BANK OF ENGLISH AND BEYOND: Hand-crafted parsers for functional annotation

Authors

  • Timo Järvinen
Abstract

The 200-million-word corpus of the Bank of English was annotated morphologically and syntactically using the English Constraint Grammar analyser, a rule-based shallow parser developed at the Research Unit for Computational Linguistics, University of Helsinki. We discuss the annotation system and methods used in the corpus work, as well as the theoretical assumptions of the Constraint Grammar syntax. Based on our experience in large-scale corpus work, we argue for a deeper and more explicit, dependency-based syntactic representation. We present a new practical parsing system, the Functional Dependency Grammar parser, developed from the Constraint Grammar system, and discuss its suitability for treebank annotation.

Similar resources

Automatic detection of English inclusions in mixed-lingual text with an application to parsing

The influence of English continues to grow to the extent that its expressions have begun to permeate the original forms of other languages. It has become more acceptable, and in some cases fashionable, for people to combine English phrases with their native tongue. This language mixing phenomenon typically occurs initially in conversation and subsequently in written form. In fact, there is evid...

Beyond Skeleton Parsing: Producing a Comprehensive Large-Scale General-English Treebank With Full Grammatical Analysis

A treebank is a body of natural language text which has been grammatically annotated by hand, in terms of some previously established scheme of grammatical analysis. Treebanks have been used within the field of natural language processing as a source of training data for statistical part-of-speech taggers (Black et al., 1992; Brill, 1994; Merialdo, 1994; Weischedel et al., 1993) and for statist...

Generalized Higher-Order Dependency Parsing with Cube Pruning

State-of-the-art graph-based parsers use features over higher-order dependencies that rely on decoding algorithms that are slow and difficult to generalize. On the other hand, transition-based dependency parsers can easily utilize such features without increasing the linear complexity of the shift-reduce system beyond a constant. In this paper, we attempt to address this imbalance for graph-bas...

Structural metadata annotation: moving beyond English

The goal of metadata extraction (MDE) is to enable technology that can take raw speech-to-text output and refine it into forms that are more useful to humans and to downstream automatic processes. Starting in 2003, a structural metadata annotation task was defined for English as part of the DARPA EARS Program. A significant new challenge for MDE is the addition of new languages. This paper repo...

Mobile, L2 vocabulary learning, and fighting illiteracy: A case study of Iranian semi-illiterates beyond transition level

As mobile learning simultaneously employs both handheld computers and mobile telephones and other devices that draw on the same set of functionalities, it throws open the door for swift connection between learners and teachers. This study examined and articulated the impact of the application of mobile devices for teaching English vocabulary items to 123 Iranian semi-illitera...

Publication date: 2002